Connectivity of Growing Random Networks 
A solution for the time- and age-dependent connectivity distribution of a growing random network
is presented. The network is built by adding sites which link to earlier sites with a probability $A_k$
which depends on the number of pre-existing links $k$ to that site. For homogeneous connection
kernels, $A_k \sim k^\gamma$, different behaviors arise for $\gamma<1$, $\gamma>1$, and $\gamma=1$. For $\gamma<1$, the number of
sites with $k$ links, $N_k$, varies as a stretched exponential. For $\gamma>1$, a single site connects to nearly all
other sites. In the borderline case $A_k \sim k$, the power law $N_k \sim k^{-\nu}$ is found, where the exponent $\nu$
can be tuned to any value in the range $2<\nu<\infty$.

PACS numbers: 02.50.Cw, 05.40.-a, 05.50.+q, 87.18.Sn 



Random networks play an important role in epidemiol- 
ogy, ecology (food webs), and many other fields. The 
geometry of such fixed topology networks have been ex- 
tensively investigated J^-Q]. However, networks based 
on human interactions, such as transportation systems, 
electrical distribution systems, biological systems, and 
the Internet are open and continuously growing and new 
approaches are rapidly developing to understand their 
structure and time evolution M [12[ . 

In this Letter, we apply a rate equation approach to
solve the growing random network (GRN) model, a special
case of which was introduced in [13] to account for
the distribution of citations and other growing networks
[13-18]. Our approach is ideally suited for the GRN and
is much simpler than the standard probabilistic [jjj or 
generating function [|| techniques. The rate equation 
formulation can be adapted to study more general evolv- 
ing graph systems, such as networks with site deletion 
and link rearrangement.




FIG. 1. Schematic illustration of the evolution of the grow- 
ing random network. Sites are added sequentially and a single 
link joins the new site to an earlier site. 

The GRN model is defined as follows. At each time step, a new site is added
and a directed link to one of the earlier sites is created. In terms of
citations, we may interpret the sites in Fig. 1 as publications, and the
directed link from one paper to another as a citation to the earlier
publication. This growing network has a directed tree graph topology where the
basic elements are sites which are connected by directed links. The structure
of this graph is determined by the connection kernel $A_k$, which is the
probability that a newly introduced site links to an existing site with $k$
links ($k-1$ incoming and 1 outgoing). We will solve for the connectivity
distribution $N_k(t)$, defined as the average number of sites with $k$ links,
as a function of the connection kernel $A_k$.
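The growth rule just defined can be sketched as a direct simulation. This is a minimal illustration rather than the method used in this Letter: the function name `simulate_grn`, the two-site starting configuration, and the choice of a homogeneous kernel $A_k = k^\gamma$ are all assumptions made for the example.

```python
import random

def simulate_grn(num_sites, gamma, seed=0):
    """Grow a network site by site; return links[i] = total links of site i.

    Assumptions (not from the text): the network starts from two sites
    joined by one link, and the kernel is A_k = k**gamma.
    """
    rng = random.Random(seed)
    links = [1, 1]                      # both initial sites have k = 1
    while len(links) < num_sites:
        # Attachment weight of each existing site is A_k = k**gamma.
        weights = [k ** gamma for k in links]
        target = rng.choices(range(len(links)), weights=weights)[0]
        links[target] += 1              # earlier site gains an incoming link
        links.append(1)                 # new site has only its outgoing link
    return links

if __name__ == "__main__":
    links = simulate_grn(2000, gamma=1.0, seed=1)
    # Fraction of sites with a single link, to compare with the rate equations.
    print(sum(1 for k in links if k == 1) / len(links))
```

Recomputing all weights at every step costs a time quadratic in the number of sites, which is acceptable for a few thousand sites but not for large networks.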

We focus on a class of homogeneous connection kernels, $A_k = k^\gamma$, with
$\gamma \ge 0$ reflecting the tendency of preferential linking to popular
sites. As we shall show, the connectivity distribution crucially depends on
whether $\gamma$ is smaller than, larger than, or equal to unity. For
$\gamma<1$, the connectivity distribution decreases as a stretched exponential
in $k$. The case $\gamma>1$ leads to a phenomenon akin to gelation in which a
single "gel" site connects to nearly every other site of the graph. For
$\gamma>2$, this phenomenon is so extreme that the number of connections
between other sites is finite in an infinite graph. A power law distribution
$N_k \sim k^{-\nu}$ arises only for $\gamma=1$. In this case, finer details of
the dependence of the connection kernel on $k$ affect the exponent $\nu$. Hence
we consider a more general class of asymptotically linear connection kernels,
$A_k \sim k$ as $k\to\infty$. We show that $\nu$ is tunable to any value in the
range $2<\nu<\infty$. In particular, we can naturally generate values of $\nu$
between 2 and 3, as observed in the web graph [10-12] and in movie actor
collaboration networks [13].

The rate equations for the time evolution of the connectivity distribution
$N_k(t)$ are

$$\frac{dN_k}{dt} = \frac{1}{M_\gamma}\left[(k-1)^\gamma N_{k-1} - k^\gamma N_k\right] + \delta_{k1}. \qquad (1)$$

The first term accounts for the process in which a site with $k-1$ links is
connected to the new site, leading to a gain in the number of sites with $k$
links. This happens with probability $(k-1)^\gamma N_{k-1}/M_\gamma$, where
$M_\gamma(t) = \sum_j j^\gamma N_j(t)$ provides the proper normalization. A
corresponding role is played by the second (loss) term on the right-hand side
of Eq. (1). The last term accounts for the continuous introduction of new
sites with no incoming links.
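The gain and loss structure of Eq. (1) can be checked by iterating it with unit time steps. The sketch below is only an illustration under stated assumptions: the function name `iterate_rate_equations` is hypothetical, the kernel is taken as $A_k = k^\gamma$, and the starting state of two singly linked sites is our own choice.

```python
def iterate_rate_equations(steps, gamma):
    """Iterate Eq. (1) with unit time steps; return N with N[k] = N_k.

    Assumed initial condition (not from the text): two sites joined by one
    link, i.e. N_1 = 2. Index 0 of the list is unused.
    """
    N = [0.0, 2.0]
    for _ in range(steps):
        # Normalization M_gamma = sum_k k**gamma * N_k.
        M = sum(k ** gamma * N[k] for k in range(1, len(N)))
        N.append(0.0)                    # allow one larger connectivity
        new = N[:]
        for k in range(1, len(N)):
            gain = (k - 1) ** gamma * N[k - 1] if k > 1 else 0.0
            loss = k ** gamma * N[k]
            new[k] += (gain - loss) / M
        new[1] += 1.0                    # delta_{k1}: the newly added site
        N = new
    return N

if __name__ == "__main__":
    N = iterate_rate_equations(400, gamma=1.0)
    print(N[1] / sum(N))                 # fraction of sites with one link
```

For the linear kernel the printed fraction approaches 2/3 as the number of steps grows, and the total number of link endpoints increases by exactly 2 per step.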

We start by finding the low-order moments $M_n(t)$ of the connectivity
distribution. Summing Eqs. (1) over all $k$ gives the rate equation for the
total number of sites, $\dot M_0 = 1$, whose solution is $M_0(t) = M_0(0) + t$.
The first moment (the total number of bond endpoints) obeys $\dot M_1 = 2$,
which gives $M_1(t) = M_1(0) + 2t$. The first two moments are therefore
independent of $\gamma$, while higher moments and the connectivity distribution
itself do depend on $\gamma$.
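The result for the first moment can be made explicit. The following short derivation is a sketch, assuming the gain-loss form of Eq. (1) described above with $N_0 = 0$ and $M_\gamma = \sum_j j^\gamma N_j$; the index shift $j = k-1$ in the gain term is the only step involved.

```latex
\begin{align*}
\dot M_1 = \sum_{k\ge 1} k\,\dot N_k
 &= \frac{1}{M_\gamma}\sum_{k\ge 1}
    \left[k\,(k-1)^\gamma N_{k-1} - k^{\gamma+1} N_k\right] + 1 \\
 &= \frac{1}{M_\gamma}\sum_{j\ge 1}
    \left[(j+1)\,j^\gamma - j^{\gamma+1}\right] N_j + 1 \\
 &= \frac{1}{M_\gamma}\sum_{j\ge 1} j^\gamma N_j + 1 = 2 .
\end{align*}
```

The same shift applied to $\sum_k \dot N_k$ cancels the gain and loss terms entirely, leaving $\dot M_0 = 1$.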

For the linear connection kernel, Eqs. (1) can be solved for an arbitrary
initial condition. We limit ourselves to the most interesting asymptotic regime
($t\to\infty$) where the initial condition is irrelevant. Using $M_1 = 2t$, we
solve the first few of Eqs. (1) and obtain $N_1 = 2t/3$, $N_2 = t/6$, etc.,
which implies that the $N_k$ grow linearly with time. Accordingly, we
substitute $N_k(t) = t\,n_k$ in Eqs. (1) to yield the recursion relation
$n_k = n_{k-1}(k-1)/(k+2)$. Solving for $n_k$ then gives

$$n_k = \frac{4}{k(k+1)(k+2)}. \qquad (2)$$

To solve the model with a sub-linear connection kernel, $0<\gamma<1$, notice
that $M_\gamma$ satisfies the obvious inequalities $M_0 \le M_\gamma \le M_1$.
Consequently, in the long-time limit

$$M_\gamma = \mu t, \qquad 1 \le \mu \le 2, \qquad (3)$$

with a yet undetermined prefactor $\mu = \mu(\gamma)$. Now substituting
$N_k(t) = t\,n_k$ and $M_\gamma = \mu t$ into Eqs. (1) and again solving for
$n_k$ we obtain

$$n_k = \frac{\mu}{k^\gamma}\prod_{j=1}^{k}\left(1+\frac{\mu}{j^\gamma}\right)^{-1}, \qquad (4)$$

whose asymptotic behavior is

$$n_k \sim \begin{cases}
k^{-\gamma}\exp\left[-\mu\,\dfrac{k^{1-\gamma}-2^{1-\gamma}}{1-\gamma}\right] & \tfrac{1}{2}<\gamma<1;\\[2mm]
k^{(\mu^2-1)/2}\exp\left[-2\mu\sqrt{k}\right] & \gamma=\tfrac{1}{2};\\[2mm]
k^{-\gamma}\exp\left[-\mu\,\dfrac{k^{1-\gamma}}{1-\gamma}+\dfrac{\mu^2}{2}\,\dfrac{k^{1-2\gamma}}{1-2\gamma}\right] & \tfrac{1}{3}<\gamma<\tfrac{1}{2},
\end{cases} \qquad (5)$$

etc. This pattern in (5) continues ad infinitum: Whenever $\gamma$ decreases
below $1/m$, with $m$ a positive integer, an additional term in the exponential
arises from the now relevant contribution of the next higher-order term in the
expansion of the product in Eq. (4).

To complete the solution for the $n_k$, we need to establish the dependence of
the amplitude $\mu$ on $\gamma$. Using the defining relation
$M_\gamma/t = \mu = \sum_{k\ge 1} k^\gamma n_k$, together with Eq. (4), we
obtain the implicit relation for $\mu(\gamma)$

$$\sum_{k=2}^{\infty}\prod_{j=2}^{k}\left(1+\frac{\mu}{j^\gamma}\right)^{-1} = \mu. \qquad (6)$$

Despite the simplicity of this exact expression, it is not easy to extract
explicit information except for the limiting cases $\gamma=0$ and $\gamma=1$,
where $\mu=1$ and $\mu=2$ respectively, and the corresponding connectivity
distributions are given by $n_k = 2^{-k}$ and by Eq. (2). However, numerical
evaluation shows that $\mu$ varies smoothly between 1 and 2 as $\gamma$
increases from 0 to 1 (Fig. 2). This result, together with Eq. (4), provides a
comprehensive description of the connectivity distribution in the regime
$0\le\gamma\le 1$. It is worth emphasizing that for $0.8<\gamma<1$, $n_k$
depends weakly on $\gamma$ for $1\le k\le 1000$. Thus, it is difficult to
discriminate between different $\gamma$'s and even to distinguish a power law
from a stretched exponential in the GRN model. This subtlety was already
encountered in the analysis of the citation distribution [15,16].

FIG. 2. The amplitude $\mu$ in $M_\gamma(t) = \mu t$ versus $\gamma$.
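The numerical evaluation of the amplitude can be sketched as follows. This is an illustrative implementation, not the authors' procedure: it reads Eq. (6) as $\sum_{k\ge2}\prod_{j=2}^{k}(1+\mu/j^\gamma)^{-1} = \mu$ and Eq. (4) as $n_k = (\mu/k^\gamma)\prod_{j=1}^{k}(1+\mu/j^\gamma)^{-1}$, and the function names, the bisection bracket $[1,2]$, and the truncation tolerance are assumptions.

```python
def eq6_sum(mu, gamma, tol=1e-14, max_terms=10**6):
    """Left-hand side of Eq. (6), truncated once the terms fall below tol."""
    total, prod, k = 0.0, 1.0, 2
    while k < max_terms:
        prod /= 1.0 + mu / k ** gamma    # running product over j = 2..k
        total += prod
        if prod < tol:
            break
        k += 1
    return total

def solve_mu(gamma, lo=1.0, hi=2.0, iters=60):
    """Bisection for the root of eq6_sum(mu) = mu, assumed to lie in [1, 2]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if eq6_sum(mid, gamma) > mid:    # sum decreases with mu: root is above
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def n_k_from_eq4(k_max, gamma, mu):
    """Return [n_1, ..., n_kmax] from the product formula of Eq. (4)."""
    out, prod = [], 1.0
    for k in range(1, k_max + 1):
        prod /= 1.0 + mu / k ** gamma    # running product over j = 1..k
        out.append(mu / k ** gamma * prod)
    return out

if __name__ == "__main__":
    for gamma in (0.0, 0.25, 0.5, 0.75):
        print(gamma, solve_mu(gamma))
```

The truncated sum converges quickly for $\gamma$ well below 1 but slowly as $\gamma\to1$, where the terms decay only as a power law; the default tolerance is therefore not suited to $\gamma$ very close to unity.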



A striking feature of the GRN model is that we can "tune" the exponent $\nu$ by
augmenting the linear connection kernel to the asymptotically linear connection
kernel, with $A_k \to a_\infty k$ as $k\to\infty$, but otherwise arbitrary. For
this asymptotically linear kernel, by repeating the steps leading to Eq. (4)
we find

$$n_k = \frac{\mu}{A_k}\prod_{j=1}^{k}\left(1+\frac{\mu}{A_j}\right)^{-1}. \qquad (7)$$

Expanding the product in Eq. (7) leads to $n_k \sim k^{-\nu}$ with
$\nu = 1 + \mu/a_\infty$, while the amplitude $\mu$ is found from

$$\mu = A_1\sum_{k=2}^{\infty}\prod_{j=2}^{k}\left(1+\frac{\mu}{A_j}\right)^{-1}. \qquad (8)$$

As an explicit example, consider the connection kernel $A_1 = 1$ and
$A_k = a_\infty k$ for $k\ge 2$. In this case, we can reduce Eq. (8) to a
quadratic equation from which we obtain
$\nu = \left(3+\sqrt{1+8/a_\infty}\right)/2$, which can indeed be tuned to any
value larger than 2.
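The explicit example can be checked numerically. The sketch below evaluates the closed form for $\nu$ and compares $\mu = a_\infty(\nu-1)$ against a truncated version of the sum in Eq. (8), read here as $\mu = A_1\sum_{k\ge2}\prod_{j=2}^{k}(1+\mu/A_j)^{-1}$; the function names and the truncation length are assumptions made for illustration.

```python
import math

def nu_exponent(a_inf):
    """Exponent for the kernel A_1 = 1, A_k = a_inf * k (k >= 2)."""
    return (3.0 + math.sqrt(1.0 + 8.0 / a_inf)) / 2.0

def eq8_rhs(mu, a_inf, terms=200_000):
    """Truncated right-hand side of Eq. (8) for A_1 = 1, A_j = a_inf * j."""
    total, prod = 0.0, 1.0
    for k in range(2, terms + 1):
        prod /= 1.0 + mu / (a_inf * k)   # running product over j = 2..k
        total += prod
    return total                         # A_1 = 1 multiplies the sum

if __name__ == "__main__":
    for a_inf in (0.5, 1.0):
        nu = nu_exponent(a_inf)
        mu = a_inf * (nu - 1.0)          # from nu = 1 + mu / a_inf
        print(a_inf, nu, mu, eq8_rhs(mu, a_inf))
```

For $a_\infty = 1$ this recovers $\nu = 3$ and $\mu = 2$, the strictly linear result. The truncated sum converges slowly when $\nu$ is close to 2 (large $a_\infty$), since its terms then decay only slightly faster than $1/k$.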

The GRN model with super-linear connection kernels, $\gamma>1$, exhibits a
"winner take all" phenomenon, namely the emergence of a single dominant "gel"
site which is linked to almost every other site. A particularly singular
behavior occurs for $\gamma>2$, where there is a non-zero probability that the
initial site is connected to every other site of the graph. To determine this
probability, it is convenient to consider a discrete-time version of the
process where one site is introduced at each step and always links to the
initial site. After $N$ steps, the initial site has $N$ links while each of the
other $N$ sites has one, so the probability that the new site will link to the
initial site is $N^\gamma/(N+N^\gamma)$. This pattern continues indefinitely
with probability

$$\mathcal{P} = \prod_{N=1}^{\infty}\frac{1}{1+N^{1-\gamma}}. \qquad (9)$$

Clearly, $\mathcal{P}=0$ when $\gamma\le 2$ but $\mathcal{P}>0$ when
$\gamma>2$. Thus for $\gamma>2$ there is a non-zero probability that the
initial site connects to all other sites.
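The infinite product in Eq. (9), read as $\prod_{N\ge1} 1/(1+N^{1-\gamma})$, is easy to evaluate by truncation. The sketch below is an illustration with a hypothetical function name and an arbitrary cutoff; it accumulates logarithms to avoid loss of precision.

```python
import math

def prob_initial_site_takes_all(gamma, terms=10**5):
    """Truncated product of Eq. (9): prod_{N=1}^{terms} 1 / (1 + N**(1-gamma))."""
    log_p = 0.0
    for n in range(1, terms + 1):
        log_p -= math.log1p(n ** (1.0 - gamma))
    return math.exp(log_p)

if __name__ == "__main__":
    for gamma in (2.0, 2.5, 3.0, 4.0):
        print(gamma, prob_initial_site_takes_all(gamma))
```

At $\gamma = 3$ the product has the closed form $\pi/\sinh\pi$, which provides a check on the truncation. At $\gamma = 2$ the truncated product equals $1/(\text{terms}+1)$ and so tends to zero as the cutoff grows, in line with the statement that the probability vanishes at the threshold.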

To determine the behavior for general $\gamma>1$, we need the asymptotic time
dependence of $M_\gamma$. To this end, it is useful to consider the discretized
version of the master equations, Eq. (1), where the time $t$ is limited to
integer values. Then $N_k(t) = 0$ whenever $k>t$ and the rate equation for
$N_k(k)$ immediately leads to

$$N_k(k) = \frac{(k-1)^\gamma N_{k-1}(k-1)}{M_\gamma(k-1)} = N_2(2)\prod_{j=2}^{k-1}\frac{j^\gamma}{M_\gamma(j)}. \qquad (10)$$

From this and the obvious fact that $N_k(k)$ must be less than unity, it
follows that $M_\gamma(t)$ cannot grow more slowly than $t^\gamma$. On the
other hand, $M_\gamma(t)$ cannot grow faster than $t^\gamma$, as follows from
the estimate

$$M_\gamma(t) = \sum_{k=1}^{t} k^\gamma N_k(t) \le t^{\gamma-1}\sum_{k=1}^{t} k N_k(t) = t^{\gamma-1} M_1(t). \qquad (11)$$

Thus $M_\gamma \propto t^\gamma$. In fact, the amplitude of $t^\gamma$ is unity,
as will be derived self-consistently after solving for the $N_k$'s.

We now use $M_\gamma \sim t^\gamma$ in the rate equations to solve recursively
for each $N_k$. Starting with the equation $\dot N_1 = 1 - N_1/M_\gamma$, the
second term on the right-hand side is sub-dominant; neglecting this term gives
$N_1 = t$. Continuing this same line of reasoning for each successive rate
equation gives the leading behavior of $N_k$,

$$N_k = J_k\, t^{\,k-(k-1)\gamma} \quad \text{for } k\ge 1, \qquad (12)$$

with $J_k = \prod_{j=1}^{k-1} j^\gamma/[1+j(1-\gamma)]$. This pattern of
behavior for $N_k$ continues as long as its exponent $k-(k-1)\gamma$ remains
positive, or $k<\gamma/(\gamma-1)$. The full behavior of the $N_k$ may be
determined straightforwardly by keeping the next correction terms in the rate
equations. For example, $N_1 = t - t^{2-\gamma}/(2-\gamma) + \ldots$.

For $k>\gamma/(\gamma-1)$, each $N_k$ has a finite limiting value in the
long-time limit. Since the total number of connections equals $2t$ and $t$ of
them are associated with $N_1$, the remaining $t$ links must all connect to a
single site which has $t$ connections (up to corrections which grow no faster
than sub-linearly with time). Consequently the amplitude of $M_\gamma$ equals
unity, as argued above.
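The leading behavior in Eq. (12) can be tabulated for a given $\gamma>1$. The sketch below assumes the form $N_k = J_k\,t^{k-(k-1)\gamma}$ with $J_k = \prod_{j=1}^{k-1} j^\gamma/[1+j(1-\gamma)]$, and uses the identity $1+j(1-\gamma) = (j+1) - j\gamma$, i.e. each denominator is the exponent of the next $N_{j+1}$. The function name is hypothetical.

```python
def superlinear_leading_terms(gamma):
    """List (k, exponent, J_k) for every k whose exponent k-(k-1)*gamma is > 0."""
    if gamma <= 1:
        raise ValueError("super-linear kernels require gamma > 1")
    terms, J, k = [], 1.0, 1
    while True:
        terms.append((k, k - (k - 1) * gamma, J))
        next_exponent = (k + 1) - k * gamma   # equals 1 + k * (1 - gamma)
        if next_exponent <= 0:
            break                             # N_{k+1} no longer grows as a power
        J *= k ** gamma / next_exponent       # factor j = k of J_{k+1}
        k += 1
    return terms

if __name__ == "__main__":
    for gamma in (1.2, 1.4, 1.6, 2.5):
        print(gamma, superlinear_leading_terms(gamma))
```

For $\gamma>2$ only $k=1$ survives, while lowering $\gamma$ toward 1 admits successively more growing $N_k$, one new entry each time $\gamma$ drops below $(m+1)/m$.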



Thus for super-linear kernels, the GRN undergoes an infinite sequence of
connectivity transitions as a function of $\gamma$. For $\gamma>2$ all but a
finite number of sites are linked to the "gel" site which has the rest of the
links of the network. This is the "winner take all" situation. For
$3/2<\gamma<2$, the number of sites with two links grows as $t^{2-\gamma}$,
while the number of sites with more than two links is again finite. For
$4/3<\gamma<3/2$, the number of sites with three links grows as
$t^{3-2\gamma}$ and the number with more than three is finite. Generally for
$\frac{m+1}{m}<\gamma<\frac{m}{m-1}$, the number of sites with more than $m$
links is finite, while $N_k \sim t^{k-(k-1)\gamma}$ for $k\le m$. Logarithmic
corrections also arise at the transition points.

The connectivity distribution leads to an amusing consequence for the most
popular site. Its connectivity $k_{\max}$ is determined by
$\sum_{k>k_{\max}} N_k = 1$; that is, there is one site whose connectivity lies
in the range $(k_{\max},\infty)$. This criterion gives

$$k_{\max} \sim \begin{cases}
(\ln t)^{1/(1-\gamma)} & 0\le\gamma<1;\\
t^{1/(\nu-1)} & \text{asymptotically linear};\\
t & \text{super-linear}.
\end{cases} \qquad (13)$$

Since $t$ also equals the total number of sites, we can compare this prediction
about the most popular site with available data from the Institute of
Scientific Information based on 783,339 papers with 6,716,198 total citations
(details in Ref. [16]). Here the most cited paper had 8,904 citations. This
accords with the first line of Eq. (13) for $\gamma\approx 0.86$, and also with
the second when $\nu\approx 2.5$.
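The comparison with the citation data can be reproduced for the power-law case. The sketch below evaluates only the leading scaling $k_{\max}\sim t^{1/(\nu-1)}$, which follows from $t\sum_{k>k_{\max}} k^{-\nu}\sim 1$; dropping the order-one prefactor is an assumption of this example, so only the order of magnitude is meaningful. The function name is hypothetical.

```python
def k_max_asymptotically_linear(t, nu):
    """Leading-order estimate k_max ~ t**(1/(nu-1)); prefactor of order one dropped."""
    if nu <= 2:
        raise ValueError("the power-law regime requires nu > 2")
    return t ** (1.0 / (nu - 1.0))

if __name__ == "__main__":
    # 783,339 papers and nu = 2.5, as quoted in the text.
    print(k_max_asymptotically_linear(783_339, 2.5))
```

With these inputs the estimate is of order $8.5\times10^{3}$, the same order as the 8,904 citations of the most cited paper.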

In addition to the connectivity of a site, we also may 
ask about its age. Within the GRN model, older sites 
should clearly be more highly connected. We quantify 
this feature and also determine how the connection ker- 
nel affects the combined age and connectivity distribu- 
tion. Note that our model does not have explicit aging 
where the connection kernel depends on the age of each 
site; this feature is treated in Ref. [17].

Let $c_k(t,a)$ be the average number of sites of age $a$ which have $k-1$
incoming links at time $t$. Here age $a$ means that the site was introduced at
time $t-a$. The quantity $c_k(t,a)$ evolves according to

$$\left(\frac{\partial}{\partial t}+\frac{\partial}{\partial a}\right)c_k = \frac{1}{M_\gamma}\left[(k-1)^\gamma c_{k-1} - k^\gamma c_k\right] + \delta_{k1}\,\delta(a). \qquad (14)$$

The second term on the left-hand side accounts for the aging of sites, while
the right-hand side accounts for the (age independent) connection changing
processes. Consider first the linear kernel, $A_k = k$. Let us focus again on
the most interesting limit, namely asymptotic behavior. Then we can disregard
the initial condition and write $M_1(t) = 2t$. This transforms Eqs. (14) into

$$\left(\frac{\partial}{\partial t}+\frac{\partial}{\partial a}\right)c_k = \frac{(k-1)\,c_{k-1} - k\,c_k}{2t} + \delta_{k1}\,\delta(a). \qquad (15)$$



The homogeneous form of this equation suggests that the solution should be
self-similar. Specifically, one can seek a solution as a function of the single
variable $a/t$ rather than two separate variables, $c_k(t,a) = f_k(a/t)$. This
simplifies the partial differential equation (15) into an ordinary differential
equation for $f_k(x)$ which can be easily solved. In terms of the original
variables $a$ and $t$, we find

$$c_k(t,a) = \sqrt{1-\frac{a}{t}}\,\left\{1-\sqrt{1-\frac{a}{t}}\right\}^{k-1}. \qquad (16)$$

Notice that this age distribution satisfies the normalization requirement,
$N_k(t) = \int_0^t da\, c_k(t,a)$. As expected, young sites (those with
$a/t\to 0$) typically have a small connectivity while old sites have large
connectivity. Further, old sites have a broad distribution of connectivities up
to a characteristic number which asymptotically grows as
$\langle k\rangle \sim (1-a/t)^{-1/2}$ as $a\to t$. These properties and
related issues may be worthwhile to investigate in citation and other
information networks.
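The normalization requirement can be verified numerically. The sketch below assumes the closed form $c_k(t,a) = \sqrt{1-a/t}\,\{1-\sqrt{1-a/t}\}^{k-1}$ for Eq. (16) and compares its integral over age with $N_k = 4t/[k(k+1)(k+2)]$ from Eq. (2); the function names and the midpoint quadrature are choices made for illustration.

```python
import math

def c_k_linear(k, t, a):
    """Age-resolved density for the linear kernel, assumed form of Eq. (16)."""
    u = math.sqrt(1.0 - a / t)
    return u * (1.0 - u) ** (k - 1)

def integrate_over_age(k, t, steps=20_000):
    """Midpoint-rule estimate of the integral of c_k(t, a) over 0 <= a <= t."""
    h = t / steps
    return h * sum(c_k_linear(k, t, (i + 0.5) * h) for i in range(steps))

if __name__ == "__main__":
    t = 10.0
    for k in (1, 2, 3, 10):
        exact = 4.0 * t / (k * (k + 1) * (k + 2))
        print(k, integrate_over_age(k, t), exact)
```

At fixed age the distribution over $k$ is geometric with parameter $\sqrt{1-a/t}$, whose mean $1/\sqrt{1-a/t}$ gives the growth of the characteristic connectivity of old sites.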

Similarly, we can obtain Ck(t,a) for the GRN model 
with an arbitrary homogeneous connection kernel ]2]j] 
which grows slower than linearly in $k$. Assuming a self-similar solution
$c_k(t,a) = f_k(a/t)$ and applying a Laplace transform, we find a recursion
relation for the transform whose solution is identical in structure to Eq. (4).
Although it appears impossible to perform the inverse Laplace transform in
explicit form for arbitrary $k$, we can compute $c_k(t,a)$ for small $k$; for
example, we find $c_1 = (1-a/t)^{1/\mu}$. The behavior also simplifies in the
large-$k$ limit. Here we find that the age of sites with $k$ links is peaked
about the value $a_k$ which satisfies

$$\frac{a_k}{t} \simeq \begin{cases}
1-\exp\left[-\mu\,\dfrac{k^{1-\gamma}}{1-\gamma}\right] & \gamma<1;\\[2mm]
1-\dfrac{12}{(k+3)(k+4)} & \gamma=1.
\end{cases} \qquad (17)$$

This shows how old sites are better connected.

In summary, we solved for both the connectivity distribution and the
age-dependent structure of the growing random network. The most interesting
connectivity arises in a network with an asymptotically linear connection
kernel. Here the number of sites with $k$ connections has the power-law form
$N_k \sim k^{-\nu}$, with $\nu$ tunable to any value in the range
$2<\nu<\infty$. This accords with the connectivity distributions observed in
various contemporary examples of growing networks.